ON AN UNSYMMETRICAL PROBABILITY CURVE.



BY E. L. DE FOREST. 

[Continued from page 168, Vol. IX.] 

We will now illustrate the applicability of the gamma curve to represent 
series which are not expansions of any known polynomial, but are simply 
the results of repeated observation of some phenomenon or occurrence, in 
which there is a manifest inequality in the distribution of the errors or 
deviations on either side of the mean. Take for example the observations 
given by Quetelet in his Letters already cited, of the amplitude of diurnal 
variation of temperature (centigrade) at Brussels in the month of January, 
as observed for a period of 10 years, from 1833 to 1842. Column (1) of
our Table II. shows the various amplitudes, grouped by differences of 1°,
the smallest one observed being between 1° and 2°, while the largest was 
between 13° and 14°. Column (2) shows the number of days on which 
each of the different amplitudes occurred. The whole number of days of 
observation was 309. The number of days to each amplitude being divided
by 309, the quotients g are set in column (3). These may be regarded as
the approximate probabilities that, in any future observation, the corres- 
ponding amplitudes will occur. The sum of all these probabilities is unity. 

In any probability curve, either (52) or (57), the expression for y contains 
dx as a factor, so that we may if we please, regard y not as an ordinate, but 
as the differential Ydx of the area of a curve, that is, the area included between
any two consecutive ordinates Y, whose abscissas are $x - \tfrac{1}{2}dx$ and
$x + \tfrac{1}{2}dx$. Under this view, the total area of the probability curve is unity.
The numbers g in column (3) approximately represent finite sections of this 
area, included between equidistant ordinates, the common interval between 
which is 1°, which we may take as the unit of x. The area of any section 
whose base is the unit of x will be approximately equal, numerically, to the 
middle ordinate of that section, and the approximation is closer, the smaller 
the adopted unit of x is. Thus the numbers g may be regarded as equidis- 
tant ordinates, corresponding to the abscissas 1.5°, 2.5°, &c, which are en- 
tered in column (4). We can construct a gamma curve (52) to represent 
these ordinates, just as if they were the coefficients in the expansion of some 
polynomial, and the curve thus obtained will be a close approximation to 
the true curve of probability. 

First, to find the centre of gravity of the terms in column (3), regarded 
as the masses of material points, multiply each into its lever arm or distance 
from the place of amplitude zero, as given in column (4), and set the result- 
ing moments in column (5). Their sum is 5.197, and dividing this by the 
sum of the masses, which is unity, we get 5.2 nearly as the lever arm of the 
centre of gravity. Subtracting this from the numbers in column (4), we 
have the abscissas x of the masses referred to their centre of gravity, and we 
enter them in column (6). They are the errors of the several observed quantities,
referred to their arithmetical mean as a standard. The squared q. m. error
$\varepsilon^2$ is found just as in the case of a common probability curve. Multiplying
the square of each error by its probability, we set the products in
column (7). Their sum is $\varepsilon^2 = 4.647$, this being unchanged when divided
by the sum of the probabilities, which is unity. Likewise the cube of the
c. m. inequality is found by multiplying the cube of each error by its probability,
setting the result in column (8), and taking their algebraic sum,
which is $\zeta^3 = 8.490$. Then by (39), the constants in the gamma curve are

$$a = \frac{2\varepsilon^2}{\zeta^3} = \frac{9.294}{8.490} = 1.095, \qquad b = \varepsilon^2 = 4.647. \tag{64}$$
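The steps just described (probabilities, centre of gravity, errors, second and third moments, constants) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gamma_constants` is hypothetical, the relation $a = 2\varepsilon^2/\zeta^3$ is read from the numerical values in (64), and the small data set is an invented toy example, not Table II, whose columns are not reproduced in this section.

```python
def gamma_constants(midpoints, counts):
    """Return (mean, eps2, zeta3, a, b) for grouped observations.

    midpoints: value at the middle of each group, as in column (4)
    counts:    number of observations in each group, as in column (2)
    """
    total = sum(counts)
    g = [c / total for c in counts]                     # column (3): probabilities
    mean = sum(gi * t for gi, t in zip(g, midpoints))   # centre of gravity
    x = [t - mean for t in midpoints]                   # column (6): errors
    eps2 = sum(gi * xi ** 2 for gi, xi in zip(g, x))    # squared q. m. error
    zeta3 = sum(gi * xi ** 3 for gi, xi in zip(g, x))   # cube of c. m. inequality
    a = 2 * eps2 / zeta3                                # assumed reading of (64)
    b = eps2
    return mean, eps2, zeta3, a, b


# Hypothetical toy data (NOT the Brussels observations).
toy = gamma_constants([1.5, 2.5, 3.5], [2, 1, 1])

# The published sums of columns (7) and (8) give the constant a directly.
a_published = 2 * 4.647 / 8.490
```

With the published sums, `a_published` rounds to 1.095, in agreement with (64).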

Hence $a^2b = 5.572$, and (52) gives $K = 1.0151$. The adopted unit of
$x$ being the common interval between successive ordinates in column (3),
which interval is represented by $dx$ in (52), we have $dx = 1$, and

$$\log y = \bar{1}.26081 + 4.572 \log(1 + .19652x) - .47555x. \tag{65}$$

Giving to x in this equation its values from column (6) of the table, we 
obtain the values of y which are set in column (9). Their sum is unity as it 
should be. They represent the general form of the given series of proba- 
bilities pretty closely, as shown by the differences g — y in column (10). 
The numbers y may be fairly presumed to come much nearer to the true 
probabilities than the numbers g do. 
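As a rough check on (65), the following sketch evaluates it at the three errors $x = -1.7, -0.7, 0.3$ that belong to the amplitudes 3.5°, 4.5° and 5.5°. Two assumptions are made and should be treated as such: the leading constant is read as a common logarithm with negative characteristic, that is $-1 + .26081$, and the curve (52), which is not reproduced in this section, is taken to be the ordinary gamma density with its origin moved to the mean. The function names are hypothetical.

```python
import math


def y_from_65(x):
    """Ordinate from equation (65), using common (base 10) logarithms."""
    log_y = (-1 + 0.26081) + 4.572 * math.log10(1 + 0.19652 * x) - 0.47555 * x
    return 10 ** log_y


# Constants from (64).
a, b = 1.095, 4.647
n = a * a * b  # about 5.572


def y_gamma(x, dx=1.0):
    """Assumed form of (52): gamma density with origin at the mean."""
    v = a * (x + a * b)  # distance from the start of the curve, times a
    return dx * a * math.exp((n - 1) * math.log(v) - v - math.lgamma(n))


errors = [-1.7, -0.7, 0.3]
ordinates = [y_from_65(x) for x in errors]
```

Under these assumptions the two forms agree closely with each other, and the ordinates come out within about .001 of the values .183, .200, .171 quoted from column (9).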

As we have noticed, the numbers y here are taken to represent the areas 
lying between equidistant ordinates. The probability that an observed am- 
plitude will be between 3° and 6° is therefore approximately 
.183 + .200 + .171 = .554.

To find the probability that it will fall between any limits which are 
fractions of a degree, we can make an interpolation by the method which I 
gave in the Smithsonian Report of 1871, p. 309. 

If however it is desired to evaluate rigorously the area of the gamma
curve between given limits, it will be best to take the equation (25), where
the origin is at the point in which the curve meets the X axis, and write

$$Y = \frac{a}{\Gamma(a^2b)}\,(ax)^{a^2b-1} e^{-ax}. \tag{66}$$

A known formula, obtained by integration by parts, is

$$\int v^{n-1} e^{-v}\,dv = -e^{-v}\left[v^{n-1} + (n-1)v^{n-2} + (n-1)(n-2)v^{n-3} + \dots + (n-1)(n-2)\cdots 2\cdot 1\right] + C, \tag{67}$$

where n is supposed to be an integer, otherwise the series will not thus ter- 
minate. We have then, 

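$$\int_v^\infty v^{n-1} e^{-v}\,dv = v^{n-1}e^{-v}\left\{1 + \frac{n-1}{v} + \frac{(n-1)(n-2)}{v^2} + \dots + \frac{(n-1)(n-2)\cdots 2\cdot 1}{v^{n-1}}\right\}. \tag{68}$$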
But (66) gives

$$\int Y\,dx = \frac{1}{\Gamma(a^2b)} \int (ax)^{a^2b-1} e^{-ax}\,d(ax),$$

so that by (51) and (68) we have

$$n = a^2b, \qquad v = ax,$$

$$\int_x^\infty Y\,dx = \frac{1 + \dfrac{n-1}{v} + \dfrac{(n-1)(n-2)}{v^2} + \dots + \dfrac{(n-1)(n-2)\cdots 2\cdot 1}{v^{n-1}}}{\Gamma(n)\,v^{-(n-1)}\,e^{v}}. \tag{69}$$

When $v > n$, and $n$ is somewhat large, it makes no difference for our purposes
whether $n$ is an integer or not, because the series in the numerator
will be so convergent that some of its last terms may be neglected, and if 
this is true for the two nearest integers above and below n, it is also true for 
n, even though it be fractional. The series does not always converge rapid- 
ly, but its terms are easily computed, each from the one that precedes it. 
To insure accuracy, this part of the work should be carried to two more 
places of decimals than are required in the sum of the series. To integrate 
between the limits $x_1$ and $x_2$, we take the difference of two integrals from
$x_1$ to $\infty$ and from $x_2$ to $\infty$.
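The descending series can be tried numerically. The sketch below is a minimal illustration, not the author's own computation: the function name is hypothetical, $\Gamma(n)$ is taken from Python's `math.lgamma` rather than from (51), and the series is stopped as soon as a factor $n-1$, $n-2$, &c. ceases to be positive. For a whole number $n$ this reproduces (68) exactly; for fractional $n$ it is the truncated series discussed above.

```python
import math


def upper_area_descending(n, v):
    """Area from x to infinity by the descending series, with v = a*x.

    Each term is computed from the one before it; the series is carried
    only while the factors n-1, n-2, ... remain positive.
    """
    term, total, k = 1.0, 1.0, 1
    while n - k > 0:
        term *= (n - k) / v
        total += term
        k += 1
    # v^(n-1) * e^(-v) / Gamma(n), formed through logarithms for stability
    prefactor = math.exp((n - 1) * math.log(v) - v - math.lgamma(n))
    return prefactor * total


# Whole-number check: for n = 3, v = 2 the exact area is 5 * e^(-2).
exact_case = upper_area_descending(3, 2.0)

# Fractional case with the values of Table II used later in the text.
table_case = upper_area_descending(5.572, 6.447)
```

With $n = 5.572$ and $v = 6.447$ the function returns very nearly .3111, the value obtained in the text from (69).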

But when $v < n$, or when $n$ is small, we can use by preference another formula,
also obtained by integration by parts,

$$\int v^{n-1} e^{-v}\,dv = v^n e^{-v}\left\{\frac{1}{n} + \frac{v}{n(n+1)} + \frac{v^2}{n(n+1)(n+2)} + \&c.\right\} + C, \tag{70}$$

where $n$ need not be a whole number. Taking this integral between the
limits $0$ and $v$, $C$ disappears, and we get by (66) as before

$$n = a^2b, \qquad v = ax,$$

$$\int_0^x Y\,dx = \frac{v^n e^{-v}}{\Gamma(n+1)}\left\{1 + \frac{v}{n+1} + \frac{v^2}{(n+1)(n+2)} + \&c.\right\}, \tag{71}$$

or, with the expression for $\Gamma(n+1) = n\Gamma(n)$ from (51),

$$\int_0^x Y\,dx = \frac{1 + \dfrac{v}{n+1} + \dfrac{v^2}{(n+1)(n+2)} + \&c.}{n\,\Gamma(n)\,v^{-n}\,e^{v}}. \tag{72}$$

When either one of the two integrals (69) and (72) is known, the other
is known also, because 

$$\int_0^x Y\,dx + \int_x^\infty Y\,dx = 1. \tag{73}$$

Now in Table II. we have, by (64), 

ab = 5.088, n = 5.572. 

To find the probability that a single observed amplitude will fall below 
3° for instance, the upper limit of integration is 

$$x = 5.088 - 1.7 - 0.5 = 2.888, \qquad \therefore\ v = 3.162,$$

and with these values of $n$ and $v$, (72) gives

$$\int_0^x Y\,dx = .1413.$$

For the probability that the amplitude will exceed 6°, the lower limit is

$$x = 5.088 + 0.3 + 0.5 = 5.888, \qquad \therefore\ v = 6.447,$$

and (69) gives

$$\int_x^\infty Y\,dx = .3111.$$



The series was carried only so far as the factors $n - 1$, $n - 2$, &c., were
positive, and as none of the terms were small enough to be neglected, it
might be doubted whether the result is correct. But when (72) is used,
with the same value of $v$, we get

$$\int_0^x Y\,dx = .6890,$$

and .3111 +.6890 = 1 nearly, as it should be, so that the sufficient accura- 
cy of the other result is confirmed. Thus the probabilities that an ampli- 
tude will fall below 3°, or between 3° and 6°, or above 6° are as found by 
integration 

.141, .548, .311, 

and as found by addition of terms in column (9) of the table, 
.138, .554, .308. 

The differences existing are due to the fact that the terms y in the table 
are middle ordinates, while the integration gives areas. The area between 
two ordinates which are separated by a unit interval will be numerically 
a little greater or less than the ordinate at the middle of the interval, accor- 
ding as the curve there is convex or concave toward the X axis. 
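The three probabilities found by integration can be reproduced with the ascending series alone, taking the upper area as the complement of the lower one. What follows is a hedged sketch: the function name and the stopping tolerance are illustrative choices, and $\Gamma(n+1)$ is taken from `math.lgamma` instead of (51).

```python
import math


def lower_area_ascending(n, v, tol=1e-12):
    """Area from 0 to x by the ascending series, with v = a*x.

    n need not be a whole number; terms are added until they are
    negligible relative to the running sum.
    """
    term, total, k = 1.0, 1.0, 1
    while term > tol * total:
        term *= v / (n + k)
        total += term
        k += 1
    # v^n * e^(-v) / Gamma(n+1), formed through logarithms
    prefactor = math.exp(n * math.log(v) - v - math.lgamma(n + 1))
    return prefactor * total


n = 5.572
below_3 = lower_area_ascending(n, 3.162)   # amplitude under 3 degrees
below_6 = lower_area_ascending(n, 6.447)   # amplitude under 6 degrees
between_3_and_6 = below_6 - below_3
above_6 = 1.0 - below_6                    # complement of the lower area
```

Rounded to three places these agree with the values .141, .548, .311 given above for the integration, and so differ from the sums of middle ordinates in the same way.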

The representation of these observations by the computed gamma curve 
might have been made a little more accurate if the 309 observed amplitudes 
had been published and treated separately, instead of being grouped within 
intervals of 1° each. It is of course only an approximation to the truth 
when we take the middle of such an interval as the point whose position rep- 
resents that of all the observations in the group, for the purpose of finding 
the centre of gravity of the whole series, and the deviations from it by which 
we estimate the q. m. error $\varepsilon$ and the c. m. inequality $\zeta$, and thence get the
values of $a$ and $b$. When the observations are separately given, $\varepsilon^2$ is found
just as in constructing a common probability curve, and $\zeta^3$ in like manner,
only taking the cubes of the + and − errors instead of their squares.
The unit of x may be chosen at pleasure. 

We might have made small corrections in a and b on account of the fact 
that the errors x in our table are residuals and not true errors. The calcu- 
lation, I think, would be as follows. Any particular true error is the alge- 
braic sum of the residual error and the error of the mean from which the 
residuals are reckoned. The residual error and the error of the mean may 
be treated as approximately independent of each other. Denote by $\varepsilon$ and $\zeta$
the q. m. error and c. m. inequality for a system of true errors. The
(q. m. e.)$^2$ and (c. m. i.)$^3$ for the residuals are

$$\frac{[gx^2]}{[g]} \quad\text{and}\quad \frac{[gx^3]}{[g]},$$



where [ ] signifies summation throughout the series. The q. m. error of
the mean is nearly $\varepsilon \div \sqrt{m}$, where $m$ denotes 309, the whole number of
observations. We have then by (62)

$$\varepsilon^2 = \frac{[gx^2]}{[g]} + \frac{\varepsilon^2}{m}. \tag{74}$$

The approximate c. m. inequality for the mean is $\zeta \div m^{2/3}$ according to
(60), and (63) gives

$$\zeta^3 = \frac{[gx^3]}{[g]} + \frac{\zeta^3}{m^2}. \tag{75}$$

From the above we get, since $[g] = 1$,

$$\varepsilon^2 = \frac{m}{m-1}\,[gx^2], \qquad \zeta^3 = \frac{m^2}{m^2-1}\,[gx^3]. \tag{76}$$

Now $[gx^2]$ and $[gx^3]$ are the two sums 4.647 and 8.490 of the numbers
in columns (7) and (8) of our table, so that for the system of true errors we
have

$$\varepsilon^2 = \frac{309}{308} \times 4.647 = 4.662, \qquad \zeta^3 = \frac{95481}{95480} \times 8.490 = 8.490,$$

and the corrected values of a and b are by (64) 

a = 1.098, b = 4.662. (77) 
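The arithmetic of this correction is short enough to verify directly. The sketch below assumes the ratios $m \div (m-1)$ and $m^2 \div (m^2-1)$ for passing from residuals to true errors, as stated in the text, and the relation $a = 2\varepsilon^2/\zeta^3$, $b = \varepsilon^2$ as read from (64); the variable names are illustrative only.

```python
m = 309                       # whole number of observations

eps2_residual = 4.647         # sum of column (7)
zeta3_residual = 8.490        # sum of column (8)

# Pass from residuals to true errors.
eps2_true = eps2_residual * m / (m - 1)
zeta3_true = zeta3_residual * m ** 2 / (m ** 2 - 1)

# Corrected constants of the gamma curve (assumed reading of (64)).
a_corrected = 2 * eps2_true / zeta3_true
b_corrected = eps2_true
```

To three decimals this gives 4.662 and 8.490 for the two moments and 1.098, 4.662 for the constants, as in (77).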

It will be noticed that while $\varepsilon^2$ is quite perceptibly larger for true errors
than for residuals, $\zeta^3$ is hardly increased at all. It seems reasonable that
this should be so, for the residuals are reckoned from the place of the arithmetical
mean as an origin, and the q. m. error is thereby made a minimum.
Any change in the place of the origin must increase $\varepsilon$. But there is no such
necessity in the case of $\zeta$. A change of origin may increase its absolute
value or may diminish it. According to our formula, the chances are that
it will be very slightly increased.

If the observations were separately given, we should find, in like manner, 
first, that the square of the q. m. error is greater for true errors than for 
residuals, in the ratio of $m$ to $m - 1$; and secondly, that the absolute value
of the cube of the c. m. inequality is greater also, in the ratio of $m^2$ to
$m^2 - 1$. The first of these is a well known result.

In any given set of observations there will usually be some inequality on
the + and − sides of the mean, even when the real law of error is symmetrical
on both sides, so that the asymmetry is purely fortuitous. To decide
whether $\zeta^3$ as found from the residuals is fortuitous or not, we shall
sometimes need to know what its probable value would be, on the assumption
that the true errors are represented by $x$ in the symmetrical curve

$$Y = \frac{h}{\sqrt{\pi}}\, e^{-h^2x^2}. \tag{78}$$



The whole number of possible errors, each taken a number of times proportional
to the probability of its occurrence, is represented by

$$\int_{-\infty}^{\infty} Y\,dx = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-h^2x^2}\,dx = 1.$$

Hence the mean of the squares, as well as the sum of the squares, of the
cubes of all the possible errors is represented by

$$\int_{-\infty}^{\infty} x^6\, Y\,dx = \frac{1}{h^6\sqrt{\pi}} \int_{-\infty}^{\infty} (hx)^6 e^{-h^2x^2}\,d(hx),$$

or, putting $hx = t$ and $h^2 = 1 \div 2\varepsilon^2$,

$$\int_{-\infty}^{\infty} x^6\, Y\,dx = \frac{8\varepsilon^6}{\sqrt{\pi}} \int_{-\infty}^{\infty} t^6 e^{-t^2}\,dt = 15\varepsilon^6, \tag{79}$$

the known value of the last definite integral being $\tfrac{15}{8}\sqrt{\pi}$. (Sturm, Cours
d'Analyse, II. p. 19.) The probable value of the cube of a single error is
found approximately by taking the square root of the result in (79) and 
multiplying it by .6745, which gives 

$$\pm .6745\,\varepsilon^3\sqrt{15}.$$

The probable value of the mean of the cubes of $m$ errors is therefore

$$(\zeta^3) = \pm .6745\,\varepsilon^3\sqrt{15 \div m}. \tag{80}$$

This is a standard which the actual value of £ 3 ought not very much to 
exceed, if the law of error is to be considered symmetrical. 

For example, in the set of observations at p. 495 of Vol. II. of Chauvenet's
Astronomy, $m = 40$ and $\varepsilon = .202$, and (80) gives

$$(\zeta^3) = \pm .00340.$$

Actually, the algebraic sum of the cubes of the residuals is $-.1364$, so that

$$\zeta^3 = \frac{-.1364}{40} = -.00341.$$

Of course such a very close agreement between the actual and the probable
value would not often occur, but in this and other cases, where $\zeta^3$ does not
much exceed $(\zeta^3)$, we may infer that no real c. m. inequality exists, and that
the true law of error is probably symmetrical as in (78).

On the other hand, for the set of observations in our Table II. we have
$m = 309$ and $\varepsilon^2 = 4.662$, and (80) gives the probable value

$$(\zeta^3) = \pm 1.496.$$

The actual value is $\zeta^3 = 8.490$, being almost 6 times as great. The chances
are something like 10000 to 1 against the fortuitous occurrence of an
error 6 times as great as the probable error. We must infer that, as indeed
a simple inspection of the observations in this case indicates, the c. m. inequality
here is not only apparent, but real; so that an unsymmetrical curve alone
can represent the true law of error with reasonable accuracy.
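The two applications of (80) may be checked with a short computation. The sketch is illustrative only: the function name is hypothetical, and the value of the definite integral in (79) is verified through the identity $\int_{-\infty}^{\infty} t^6 e^{-t^2}\,dt = \Gamma(7/2)$, which is a standard result rather than something stated in the text.

```python
import math


def probable_zeta3(eps, m):
    """Equation (80): probable value of the mean cube of m errors
    under the symmetrical law (78), eps being the q. m. error."""
    return 0.6745 * eps ** 3 * math.sqrt(15 / m)


# Chauvenet's observations: m = 40, eps = .202.
chauvenet = probable_zeta3(0.202, 40)

# Table II: m = 309, eps^2 = 4.662.
table_2 = probable_zeta3(math.sqrt(4.662), 309)

# How many times the actual value 8.490 exceeds the probable value.
ratio = 8.490 / table_2

# Check of the definite integral used in (79).
sixth_moment_integral = math.gamma(3.5)
```

The results are about .00340 and 1.496, and the ratio for Table II comes out somewhat below 6, consistent with the phrase "almost 6 times as great".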



